Assessing the Risk of Disclosure of Confidential Categorical Data
Abstract
Disclosure limitation involves the application of statistical tools to limit the identification of information on individuals (and enterprises) included in statistical databases such as censuses and sample surveys. We outline the major issues involved in assessing disclosure risk and assuring the protection of confidentiality for databases, especially those in the form of multi-way contingency tables, and we present a Bayesian framework for thinking about such problems from the perspectives of both an intruder and the agency trying to protect its data.
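A typical first step in assessing disclosure risk for a multi-way table is to cross-classify records on the key (potentially identifying) variables and flag cells with very small counts, such as sample uniques. The Python sketch below shows only that tabulation step, not the Bayesian framework itself. The function names, the threshold, and the example variables (`age`, `sex`, `region`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical record type: one dict per respondent, keyed by variable name.
Record = Dict[str, Hashable]
Cell = Tuple[Hashable, ...]


def cross_classify(records: Sequence[Record], keys: Sequence[str]) -> Counter:
    """Count records in each cell of the multi-way table defined by `keys`.

    Only non-empty cells appear in the returned Counter.
    """
    return Counter(tuple(r[k] for k in keys) for r in records)


def small_cells(table: Counter, threshold: int = 1) -> List[Cell]:
    """Return non-empty cells with count <= threshold (threshold=1 gives sample uniques)."""
    return [cell for cell, count in table.items() if 0 < count <= threshold]


if __name__ == "__main__":
    records = [
        {"age": "30-39", "sex": "F", "region": "North"},
        {"age": "30-39", "sex": "F", "region": "North"},
        {"age": "30-39", "sex": "M", "region": "North"},
        {"age": "30-39", "sex": "M", "region": "North"},
        {"age": "40-49", "sex": "M", "region": "South"},
    ]
    table = cross_classify(records, ["age", "sex", "region"])
    print(small_cells(table))  # [('40-49', 'M', 'South')]
```

Raising `threshold` to 2 or 3 widens the net to cells that are small but not unique. Those cells are often also treated as risky.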
Similar sources
Disclosure Risk Evaluation for Fully Synthetic Categorical Data
We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We use a “worst-case” scenario of an intruder knowing all but one of the records in the confidential data. To create the synthetic data, we use a ...
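The worst-case calculation described above can be sketched for a single categorical variable. The excerpt is cut off before naming its generator, so this sketch makes a hypothetical assumption: the synthetic counts are drawn from the Dirichlet-multinomial posterior predictive of the confidential counts under a symmetric Dirichlet prior. The intruder knows the counts of all but one record. For each possible category of the missing record, the intruder computes the likelihood of the released synthetic counts and normalizes. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def log_dirichlet_multinomial(counts: Sequence[int], alpha: Sequence[float]) -> float:
    """Log pmf of a Dirichlet-multinomial distribution with concentration `alpha`."""
    m = sum(counts)
    a_total = sum(alpha)
    out = math.lgamma(m + 1) + math.lgamma(a_total) - math.lgamma(m + a_total)
    for s, a in zip(counts, alpha):
        out += math.lgamma(s + a) - math.lgamma(s + 1) - math.lgamma(a)
    return out


def posterior_of_missing_record(
    known_counts: Sequence[int],
    synthetic_counts: Sequence[int],
    prior_alpha: float = 1.0,
    prior_over_values: Optional[Sequence[float]] = None,
) -> List[float]:
    """Intruder's posterior over the category of the one record they do not know.

    Hypothetical generator: synthetic counts ~ Dirichlet-multinomial with
    concentration prior_alpha + confidential counts. Prior weights must be > 0.
    """
    k = len(known_counts)
    if prior_over_values is None:
        prior_over_values = [1.0 / k] * k  # no prior preference among categories
    log_post = []
    for j in range(k):
        # Confidential table if the missing record falls in category j.
        confidential = list(known_counts)
        confidential[j] += 1
        alpha = [prior_alpha + c for c in confidential]
        log_post.append(
            math.log(prior_over_values[j])
            + log_dirichlet_multinomial(synthetic_counts, alpha)
        )
    # Normalize on the log scale for numerical stability.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, take `known_counts=[5, 5, 0]` and `synthetic_counts=[5, 5, 1]`. The posterior is 11/34, 11/34 and 12/34. The single synthetic record in the otherwise empty third category shifts some probability toward that category.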
Individual Disclosure Risk Measures Based on Log-Linear Models
Dissemination of microdata files should be constrained by the confidentiality pledge under which a statistical agency collects survey data. To protect the confidentiality of respondents, statistical agencies perform a two-stage statistical disclosure control procedure. In the first stage, with respect to a disclosure scenario, the risk of disclosure of each unit is estimated. After the removal ...
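One common way to turn a log-linear model into a per-record risk measure works in two steps. First, use the model's fitted values as estimates of the cell means. Then compute, for each sample unique, the expected reciprocal of its population count. The sketch below makes three assumptions: the simplest log-linear model (independence in a two-way table), Poisson population counts, and binomial sampling with known fraction `pi`. The function names are hypothetical, and the excerpt's own models may be richer.

```python
import math
from typing import Dict, List, Tuple


def independence_fit(table: List[List[int]]) -> List[List[float]]:
    """Fitted counts of the independence log-linear model for a two-way table."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot]


def sample_unique_risk(
    table: List[List[int]], sampling_fraction: float
) -> Dict[Tuple[int, int], float]:
    """Estimated E[1/F_k | f_k = 1] for every sample-unique cell k (0 < pi <= 1).

    Assumes F_k ~ Poisson(lambda_k) and f_k | F_k ~ Binomial(F_k, pi), so the
    unseen count F_k - f_k ~ Poisson((1 - pi) * lambda_k), independent of f_k.
    """
    pi = sampling_fraction
    fitted = independence_fit(table)
    risks = {}
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            if f != 1:
                continue
            # The fitted sample count estimates pi * lambda_k.
            mu = fitted[i][j] * (1 - pi) / pi
            # E[1 / (1 + X)] for X ~ Poisson(mu) equals (1 - exp(-mu)) / mu.
            risks[(i, j)] = 1.0 if mu == 0 else (1 - math.exp(-mu)) / mu
    return risks
```

Consider the table [[1, 9], [9, 81]] with pi = 0.5. The only sample unique is cell (0, 0), and its fitted count is 1, so mu = 1. The estimated risk is therefore 1 - exp(-1), about 0.632.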
Bayesian Disclosure Risk Assessment: Predicting Small Frequencies in Contingency Tables
We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focussed on regions of high probabil...
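Under a conjugate model, the predictive probability for a small cell has a closed form. The sketch below illustrates one such model; it is not the excerpt's actual model. It assumes a gamma prior on the cell mean, a Poisson population count, and binomial sampling with fraction `pi`. Under these assumptions the unseen count F_k - f_k is negative binomial. Its probability at zero, for f_k = 1, is the probability that a sample unique is also a population unique. The prior parameters `a` and `b` and the function names are hypothetical.

```python
import math


def unseen_count_pmf(x: int, f: int, pi: float, a: float = 1.0, b: float = 1.0) -> float:
    """P(F_k - f_k = x | f_k = f) under a gamma-Poisson model (0 < pi < 1).

    Assumed model: lambda_k ~ Gamma(shape=a, rate=b), F_k ~ Poisson(lambda_k),
    f_k | F_k ~ Binomial(F_k, pi). Then lambda_k | f ~ Gamma(a + f, rate=b + pi),
    and the unseen count is negative binomial.
    """
    r = a + f
    log_p = (
        math.lgamma(r + x) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log((b + pi) / (b + 1))
        + x * math.log((1 - pi) / (b + 1))
    )
    return math.exp(log_p)


def prob_population_unique(pi: float, a: float = 1.0, b: float = 1.0) -> float:
    """Predictive probability that a sample unique (f_k = 1) is also population unique."""
    return unseen_count_pmf(0, 1, pi, a, b)
```

With the defaults a = b = 1, `prob_population_unique(0.5)` gives ((1 + 0.5) / 2)^2 = 0.5625. Unlike a plug-in estimate, the gamma prior smooths the predictions for cells observed only once.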
ACRONYM: Data without Boundaries D
Disclosure limitation methods for protecting the confidentiality of respondents in survey microdata often use perturbative techniques which introduce measurement error into the categorical identifying variables. In addition, the data itself will often have measurement errors commonly arising from survey processes. There is a need for valid and practical ways to assess the protect...
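The post-randomization method (PRAM) is a standard example of such a perturbative technique. It misclassifies each categorical value according to a known transition matrix. The sketch below assumes the simplest matrix: keep the value with probability `p_keep`, otherwise move it to one of the other categories uniformly at random. It also shows how an analyst who knows that matrix can correct released proportions in expectation. The function names are hypothetical, and the excerpt does not say which perturbation it studies.

```python
import random
from typing import List, Optional, Sequence


def pram(
    values: Sequence[str],
    categories: Sequence[str],
    p_keep: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Keep each value with probability p_keep; otherwise replace it by a
    different category chosen uniformly at random (needs >= 2 categories)."""
    rng = rng or random.Random()
    released = []
    for v in values:
        if rng.random() < p_keep:
            released.append(v)
        else:
            released.append(rng.choice([c for c in categories if c != v]))
    return released


def unbias_proportions(released_props: Sequence[float], p_keep: float) -> List[float]:
    """Estimate the original proportions from released ones.

    With off = (1 - p_keep) / (k - 1), the expected released proportion is
    off + (p_keep - off) * true_j, so we invert that; requires p_keep != 1 / k.
    """
    k = len(released_props)
    off = (1 - p_keep) / (k - 1)
    return [(r - off) / (p_keep - off) for r in released_props]
```

Larger `p_keep` values preserve more analytic utility but offer less protection. The correction step is what keeps analyses of the perturbed data valid in expectation.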
The Security of Confidential Numerical Data in Databases
Organizations are storing large amounts of data in databases for data mining and other types of analysis. Some of this data is considered confidential and has to be protected from disclosure. When access to individual values of confidential numerical data in the database is prevented, disclosure may occur when a snooper uses linear models to predict individual values of confidential attributes...
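A single predictor is enough to illustrate the kind of inference this excerpt describes. Even when an individual's confidential value cannot be queried, a snooper who can fit a regression on other records may predict it closely. The sketch below uses ordinary least squares in pure Python. The function names and the tolerance rule for calling a prediction a disclosure are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predicted_disclosure(
    xs: Sequence[float],
    ys: Sequence[float],
    target_x: float,
    true_value: float,
    rel_tol: float = 0.05,
) -> Tuple[float, bool]:
    """Predict a protected value from a line fitted to other records and report
    whether the prediction lands within rel_tol of the true value (hypothetical rule)."""
    intercept, slope = fit_line(xs, ys)
    prediction = intercept + slope * target_x
    return prediction, abs(prediction - true_value) <= rel_tol * abs(true_value)
```

Take predictor values 1 to 4, confidential values 3, 5, 7 and 9, a target predictor of 5, and a true value of 11.5. The prediction is 11.0, which falls within 5% of the true value. On Python 3.10 and later, `statistics.linear_regression` returns the same slope and intercept.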